Syntactic Dependency-Based N-grams: More Evidence of Usefulness in Classification
Abstract
The paper introduces and discusses the concept of syntactic n-grams (sn-grams), which can be used instead of traditional n-grams in many NLP tasks. Sn-grams are constructed by following paths in syntactic trees, which allows syntactic knowledge to be brought into machine learning methods. However, the text must first be parsed before sn-grams can be constructed. We applied sn-grams to authorship attribution on corpora of three and seven authors, with very promising results.
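To make "following paths in syntactic trees" concrete, here is a minimal sketch, assuming that an sn-gram is a continuous head-to-dependent path of n word forms in a dependency tree. The parse below is hand-written for illustration (it is not the output of any real parser), and the names `SENTENCE`, `sn_grams`, and `linear_ngrams` are hypothetical. The sketch contrasts sn-grams with traditional linear n-grams taken in surface order.

```python
from collections import Counter

# Hypothetical hand-written dependency parse, not produced by a real parser.
# Each token is (word, head_index); head_index == -1 marks the root.
SENTENCE = [
    ("Economic", 1),   # 0: modifies "news"
    ("news", 2),       # 1: subject of "had"
    ("had", -1),       # 2: root
    ("little", 4),     # 3: modifies "effect"
    ("effect", 2),     # 4: object of "had"
    ("on", 4),         # 5: attached to "effect"
    ("financial", 7),  # 6: modifies "markets"
    ("markets", 5),    # 7: object of "on"
]


def children_of(parse):
    """Map each token index to the indices of its dependents, in text order."""
    kids = {i: [] for i in range(len(parse))}
    for i, (_, head) in enumerate(parse):
        if head >= 0:
            kids[head].append(i)
    return kids


def sn_grams(parse, n):
    """Collect every downward path of n tokens (head -> dependent -> ...)."""
    kids = children_of(parse)
    result = []

    def extend(path):
        if len(path) == n:
            result.append(tuple(parse[i][0] for i in path))
            return
        for child in kids[path[-1]]:
            extend(path + [child])

    for start in range(len(parse)):
        extend([start])
    return result


def linear_ngrams(parse, n):
    """Traditional n-grams: neighbors are taken in surface (text) order."""
    words = [w for w, _ in parse]
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    print("linear bigrams:", linear_ngrams(SENTENCE, 2))
    print("sn-bigrams:    ", sn_grams(SENTENCE, 2))
    # A frequency profile like this could serve as classifier features.
    print(Counter(sn_grams(SENTENCE, 3)))
```

Note that ("had", "effect") appears as an sn-bigram even though "little" separates the two words in the text, so it never occurs among the linear bigrams. This is the kind of syntactic neighborhood that sn-grams capture and surface n-grams miss.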
Similar resources
Syntactic Dependency-Based N-grams as Classification Features
In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in which elements are considered neighbors. In the case of sn-grams, neighbors are taken by following syntactic relations in syntactic trees, not by taking words in the order they appear in the text. Dependency trees fit directly into this idea, while in the case of constituency...
Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition
Paraphrase recognition consists in detecting whether an expression restated as another expression contains the same information. Traditionally, several lexical, syntactic, and semantic techniques are used to solve this problem. For measuring word overlap, most works use n-grams; however, syntactic n-grams have been scarcely explored. We propose using syntactic dependency and...
The Benefit of Syntactic vs. Linear N-grams for Linguistic Description
Automatic dependency annotations have been used in all kinds of language applications. However, there has been much less exploitation of dependency annotations for the linguistic description of language varieties. This paper presents an attempt to employ dependency annotations for describing style. We argue that for this purpose, linear n-grams (that follow the text’s surface) alone do not appr...
Web-scale Surface and Syntactic n-gram Features for Dependency Parsing
We develop novel first- and second-order features for dependency parsing based on the Google Syntactic Ngrams corpus, a collection of subtree counts of parsed sentences from scanned books. We also extend previous work on surface n-gram features from Web1T to the Google Books corpus and from first-order to second-order, comparing and analysing performance over newswire and web treebanks. Surface a...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Publication date: 2013